# **Exercise Session 2**

Pipelining, Static Branch Prediction, Dynamic Branch Prediction

#### Advanced Computer Architectures

Politecnico di Milano, March 12th, 2025

Alessandro Verosimile <alessandro.verosimile@polimi.it>





#### Recall: Pipeline performance

Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls

Ideal pipeline CPI: measure of the maximum performance attainable by the implementation

Structural hazards: HW cannot support this combination of instructions

Data hazards: Instruction depends on result of prior instruction still in the pipeline

Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches, jumps, exceptions)
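As a rough numerical illustration of the CPI formula above, the sketch below adds the average stall cycles per instruction of each hazard class to the ideal CPI. The function name `pipeline_cpi` and the stall values are assumptions chosen only for illustration; they are not figures from the course.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_stalls, control_stalls):
    """Effective CPI: ideal CPI plus the average stall cycles per instruction
    contributed by each hazard class."""
    return ideal_cpi + structural_stalls + data_stalls + control_stalls


# Hypothetical averages (stall cycles per instruction), for illustration only.
cpi = pipeline_cpi(ideal_cpi=1.0, structural_stalls=0.0,
                   data_stalls=0.5, control_stalls=0.25)
print(cpi)  # 1.75
```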





#### Recall: Three Classes of Hazards

Structural Hazards: Attempt to use the same resource from different instructions simultaneously. Example: Single memory for instructions and data.

Data Hazards: Attempt to use a result before it is ready. Example: Instruction depending on a result of a previous instruction still in the pipeline.

Control Hazards: Attempt to make a decision on the next instruction to execute before the condition is evaluated. Example: Conditional branch execution.





#### Recall: Data Hazards possible solutions



| Original code        | Rescheduled code     |
|----------------------|----------------------|
| sub \$2, \$1, \$3    | sub \$2, \$1, \$3    |
| and \$12, \$2, \$5   | add \$4, \$10, \$11  |
| or \$13, \$6, \$2    | and \$7, \$8, \$9    |
| add \$14, \$2, \$2   | lw \$16, 100(\$18)   |
| sw \$15, 100(\$2)    | lw \$17, 200(\$19)   |
| add \$4, \$10, \$11  | and \$12, \$2, \$5   |
| and \$7, \$8, \$9    | or \$13, \$6, \$2    |
| lw \$16, 100(\$18)   | add \$14, \$2, \$2   |
| lw \$17, 200(\$19)   | sw \$15, 100(\$2)    |
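A minimal sketch of why rescheduling helps. It compares the sequence in which the consumers of \$2 immediately follow `sub` with the one in which the independent instructions are moved in between. The pipeline configuration is an assumption, since the recalled slide does not state it: 5 stages, no forwarding, and a register written in WB can be read in ID in the same cycle. The helper `count_stalls` and the tuple encoding of the instructions are hypothetical.

```python
def count_stalls(program):
    """Count stall cycles for an in-order 5-stage pipeline without forwarding.

    Each instruction is (destination or None, [source registers]).
    Assumption: a value written in WB can be read in ID in the same cycle.
    """
    wb_cycle = {}      # register -> cycle in which its latest producer is in WB
    stalls = 0
    prev_id = 1        # so that the first instruction decodes in cycle 2
    for dest, sources in program:
        earliest = prev_id + 1
        ready = max((wb_cycle.get(r, 0) for r in sources), default=0)
        id_cycle = max(earliest, ready)
        stalls += id_cycle - earliest
        if dest is not None:
            wb_cycle[dest] = id_cycle + 3   # ID -> EX -> MEM -> WB
        prev_id = id_cycle
    return stalls


original = [
    ("$2",  ["$1", "$3"]),    # sub $2, $1, $3
    ("$12", ["$2", "$5"]),    # and $12, $2, $5
    ("$13", ["$6", "$2"]),    # or  $13, $6, $2
    ("$14", ["$2", "$2"]),    # add $14, $2, $2
    (None,  ["$15", "$2"]),   # sw  $15, 100($2)
    ("$4",  ["$10", "$11"]),  # add $4, $10, $11
    ("$7",  ["$8", "$9"]),    # and $7, $8, $9
    ("$16", ["$18"]),         # lw  $16, 100($18)
    ("$17", ["$19"]),         # lw  $17, 200($19)
]
# Independent instructions moved between sub and its consumers.
rescheduled = [original[i] for i in (0, 5, 6, 7, 8, 1, 2, 3, 4)]

print(count_stalls(original))     # 2
print(count_stalls(rescheduled))  # 0
```

Under these assumptions the reordering removes both stalls without changing the result of the program, because the moved instructions do not touch \$2.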





#### Recall: Pipelining and Forwarding





- MEM/EX path
- MEM/ID path
- MEM/MEM path













i1: add \$t1, \$t0, \$t1

i2: add \$t2, \$t1, \$t2

i3: subi \$t0, \$t2, 1

i4: sw \$t0, 0x00BB(\$t2)

i5: beq \$t0, \$t2, 0x0089

| IF                | ID                 | EX        | ME            | WB         |
|-------------------|--------------------|-----------|---------------|------------|
| Instruction Fetch | Instruction Decode | Execution | Memory Access | Write Back |

#### ALU Instructions: op \$x,\$y,\$z

| Instr. Fetch | Read of Source    | ALU Op.      | Write Back         |
|--------------|-------------------|--------------|--------------------|
| & PC Increm. | Regs. \$y and \$z | (\$y op \$z) | Destinat. Reg. \$x |

#### Load Instructions: lw \$x,offset(\$y)

| Instr. Fetch | Read of Base | ALU Op.      | Read Mem.     | Write Back         |
|--------------|--------------|--------------|---------------|--------------------|
| & PC Increm. | Reg. \$y     | (\$y+offset) | M(\$y+offset) | Destinat. Reg. \$x |

#### Store Instructions: sw \$x,offset(\$y)

| Instr. Fetch | Read of Base Reg. | ALU Op.      | Write Mem.    |  |
|--------------|-------------------|--------------|---------------|--|
| & PC Increm. | \$y & Source \$x  | (\$y+offset) | M(\$y+offset) |  |

#### Conditional Branches: beq \$x,\$y,offset

| Instr. Fetch | Read of Source    | ALU Op. (\$x-\$y) | Write |
|--------------|-------------------|-------------------|-------|
| & PC Increm. | Regs. \$x and \$y | & (PC+4+offset)   | PC    |







i1: add \$t1, \$t0, \$t1

i2: add \$t2, \$t1, \$t2

i3: subi \$t0, \$t2, 1

i4: sw \$t0, 0x00BB(\$t2)

i5: beq \$t0, \$t2, 0x0089

- No forwarding paths

- RF access R/W optimization

- Control Hazard solved in ID







i1: add \$t1, \$t0, \$t1

i2: add \$t2, \$t1, \$t2

i3: subi \$t0, \$t2, 1

i4: sw \$t0, 0x00BB(\$t2)

i5: beq \$t0, \$t2, 0x0089

- No forwarding paths
- RF access R/W optimization
- Control Hazard solved in ID
- 1) Define all conflicts/dependencies. For each of them, indicate whether it causes a hazard and the theoretical number of stalls
- 2) Draw the effective pipeline schema
- 3) Repeat 2) assuming EX/EX, MEM/EX, and MEM/MEM forwarding paths are available
- 4) Repeat 3) assuming an EX/ID forwarding path is also available





i1: add \$t1, \$t0, \$t1

i2: add \$t2, \$t1, \$t2

i3: subi \$t0, \$t2, 1

i4: sw \$t0, 0x00BB(\$t2)

i5: beq \$t0, \$t2, 0x0089

| Instr. # | Dependency<br>on Instr. # | Register involved | Hazard<br>(yes/no) | # of stalls<br>(theoretical) |
|----------|---------------------------|-------------------|--------------------|------------------------------|
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |





i1: add \$t1, \$t0, \$t1

i2: add \$t2, \$t1, \$t2

i3: subi \$t0, \$t2, 1

i4: sw \$t0, 0x00BB(\$t2)

i5: beq \$t0, \$t2, 0x0089

| Instr. # | Dependency<br>on Instr. # | Register involved | Hazard<br>(yes/no) | # of stalls<br>(theoretical) |
|----------|---------------------------|-------------------|--------------------|------------------------------|
| i2       | i1                        | \$t1              | yes                | 2                            |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |





i1: add \$t1, \$t0, \$t1

i2: add \$t2, \$t1, \$t2

i3: subi \$t0, \$t2, 1

i4: sw \$t0, 0x00BB(\$t2)

i5: beq \$t0, \$t2, 0x0089

| Instr. # | Dependency<br>on Instr. # | Register involved | Hazard<br>(yes/no) | # of stalls<br>(theoretical) |
|----------|---------------------------|-------------------|--------------------|------------------------------|
| i2       | i1                        | \$t1              | yes                | 2                            |
| i3       | i2                        | \$t2              | yes                | 2                            |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |





i1: add \$t1, \$t0, \$t1

i2: add \$t2, \$t1, \$t2

i3: subi \$t0, \$t2, 1

i4: sw \$t0, 0x00BB(\$t2)

i5: beq \$t0, \$t2, 0x0089

| Instr. # | Dependency<br>on Instr. # | Register involved | Hazard<br>(yes/no) | # of stalls<br>(theoretical) |
|----------|---------------------------|-------------------|--------------------|------------------------------|
| i2       | i1                        | \$t1              | yes                | 2                            |
| i3       | i2                        | \$t2              | yes                | 2                            |
| i4       | i2                        | \$t2              | yes                | 1                            |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |





i1: add \$t1, \$t0, \$t1

i2: add \$t2, \$t1, \$t2

i3: subi \$t0, \$t2, 1

i4: sw \$t0, 0x00BB(\$t2)

i5: beq \$t0, \$t2, 0x0089

| Instr. # | Dependency<br>on Instr. # | Register involved | Hazard<br>(yes/no) | # of stalls<br>(theoretical) |
|----------|---------------------------|-------------------|--------------------|------------------------------|
| i2       | i1                        | \$t1              | yes                | 2                            |
| i3       | i2                        | \$t2              | yes                | 2                            |
| i4       | i2                        | \$t2              | yes                | 1                            |
| i4       | i3                        | \$t0              | yes                | 2                            |
|          |                           |                   |                    |                              |
|          |                           |                   |                    |                              |





i1: add \$t1, \$t0, \$t1

i2: add \$t2, \$t1, \$t2

i3: subi \$t0, \$t2, 1

i4: sw \$t0, 0x00BB(\$t2)

i5: beq \$t0, \$t2, 0x0089

| Instr. # | Dependency<br>on Instr. # | Register involved | Hazard<br>(yes/no) | # of stalls<br>(theoretical) |
|----------|---------------------------|-------------------|--------------------|------------------------------|
| i2       | i1                        | \$t1              | yes                | 2                            |
| i3       | i2                        | \$t2              | yes                | 2                            |
| i4       | i2                        | \$t2              | yes                | 1                            |
| i4       | i3                        | \$t0              | yes                | 2                            |
| i5       | i2                        | \$t2              | no                 | 0                            |
|          |                           |                   |                    |                              |





i1: add \$t1, \$t0, \$t1

i2: add \$t2, \$t1, \$t2

i3: subi \$t0, \$t2, 1

i4: sw \$t0, 0x00BB(\$t2)

i5: beq \$t0, \$t2, 0x0089

| Instr. # | Dependency<br>on Instr. # | Register involved | Hazard<br>(yes/no) | # of stalls<br>(theoretical) |
|----------|---------------------------|-------------------|--------------------|------------------------------|
| i2       | i1                        | \$t1              | yes                | 2                            |
| i3       | i2                        | \$t2              | yes                | 2                            |
| i4       | i2                        | \$t2              | yes                | 1                            |
| i4       | i3                        | \$t0              | yes                | 2                            |
| i5       | i2                        | \$t2              | no                 | 0                            |
| i5       | i3                        | \$t0              | yes                | 1                            |
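The dependency table can be cross-checked mechanically. The sketch below assumes the configuration stated in the exercise (no forwarding paths, register file written in the first half of the cycle and read in the second half), so a consumer at distance d from its producer needs max(0, 3 - d) stalls. The tuple encoding of the program and the function name `raw_dependencies` are illustrative assumptions.

```python
# Each instruction: (label, destination register or None, source registers).
program = [
    ("i1", "$t1", ["$t0", "$t1"]),  # add  $t1, $t0, $t1
    ("i2", "$t2", ["$t1", "$t2"]),  # add  $t2, $t1, $t2
    ("i3", "$t0", ["$t2"]),         # subi $t0, $t2, 1
    ("i4", None,  ["$t0", "$t2"]),  # sw   $t0, 0x00BB($t2)
    ("i5", None,  ["$t0", "$t2"]),  # beq  $t0, $t2, 0x0089
]


def raw_dependencies(program):
    """Return (consumer, producer, register, hazard, stalls) for every
    read-after-write dependency on the nearest earlier writer."""
    rows = []
    for j, (consumer, _, sources) in enumerate(program):
        for reg in sorted(set(sources)):
            for i in range(j - 1, -1, -1):      # search backwards for the writer
                producer, dest, _ = program[i]
                if dest == reg:
                    distance = j - i
                    stalls = max(0, 3 - distance)
                    rows.append((consumer, producer, reg, stalls > 0, stalls))
                    break
    return rows


for row in raw_dependencies(program):
    print(row)
```

The rows come out grouped by consumer, so their order may differ from the table, but the set of dependencies and stall counts is the same.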







|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





#### CC<sub>1</sub>

|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID(s) |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF(s) |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     |       |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID(s) | ID(s) |       |       |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF(s) | IF(s) |       |       |       |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID(s) | ID(s) | ID    |       |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF(s) | IF(s) | IF    |       |       |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID(s) | ID(s) | ID    | EX    |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF(s) | IF(s) | IF    | ID(s) |       |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       | IF(s) |       |       |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID(s) | ID(s) | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF(s) | IF(s) | IF    | ID(s) | ID(s) | ID    |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       | IF(s) | IF(s) | IF    |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID(s) | ID(s) | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF(s) | IF(s) | IF    | ID(s) | ID(s) | ID    | EX    |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       | IF(s) | IF(s) | IF    | ID(s) |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       | IF(s) |       |       |       |       |       |       |       |





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID(s) | ID(s) | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF(s) | IF(s) | IF    | ID(s) | ID(s) | ID    | EX    | M     |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       | IF(s) | IF(s) | IF    | ID(s) | ID(s) |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       | IF(s) | IF(s) |       |       |       |       |       |       |





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID(s) | ID(s) | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF(s) | IF(s) | IF    | ID(s) | ID(s) | ID    | EX    | M     | WB    |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       | IF(s) | IF(s) | IF    | ID(s) | ID(s) | ID    |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       | IF(s) | IF(s) | IF    |       |       |       |       |       |





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID(s) | ID(s) | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF(s) | IF(s) | IF    | ID(s) | ID(s) | ID    | EX    | M     | WB    |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       | IF(s) | IF(s) | IF    | ID(s) | ID(s) | ID    | EX    | M     | WB    |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       | IF(s) | IF(s) | IF    | ID    | EX    | M     | WB    |       |
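The completed diagram can be reproduced with a short simulation. This is a sketch under the exercise's stated assumptions (no forwarding, a register written in WB readable in ID in the same cycle, in-order pipeline); the function name `schedule_no_forwarding` and the tuple encoding are hypothetical, and `(s)` marks a cycle spent stalled in a stage, as in the tables.

```python
program = [
    ("$t1", ["$t0", "$t1"]),  # i1: add  $t1, $t0, $t1
    ("$t2", ["$t1", "$t2"]),  # i2: add  $t2, $t1, $t2
    ("$t0", ["$t2"]),         # i3: subi $t0, $t2, 1
    (None,  ["$t0", "$t2"]),  # i4: sw   $t0, 0x00BB($t2)
    (None,  ["$t0", "$t2"]),  # i5: beq  $t0, $t2, 0x0089
]


def schedule_no_forwarding(program):
    """Return one {cycle: stage} dict per instruction."""
    rows, wb_cycle = [], {}
    prev_id_start, prev_id_done = 1, 1   # makes i1 fetch in C1 and decode in C2
    for dest, sources in program:
        if_start = prev_id_start          # IF frees up when the previous instr. enters ID
        id_start = prev_id_done + 1       # ID frees up when the previous instr. leaves it
        ready = max((wb_cycle.get(r, 0) for r in sources), default=0)
        id_done = max(id_start, ready)    # wait until every producer reaches WB
        row = {c: "IF(s)" for c in range(if_start, id_start - 1)}
        row[id_start - 1] = "IF"
        row.update({c: "ID(s)" for c in range(id_start, id_done)})
        row[id_done] = "ID"
        row[id_done + 1], row[id_done + 2], row[id_done + 3] = "EX", "M", "WB"
        if dest is not None:
            wb_cycle[dest] = id_done + 3
        rows.append(row)
        prev_id_start, prev_id_done = id_start, id_done
    return rows


rows = schedule_no_forwarding(program)
total_cycles = max(rows[-1])
for label, row in zip(["i1", "i2", "i3", "i4", "i5"], rows):
    print(label, " ".join(f"{row.get(c, '-'):>5}" for c in range(1, total_cycles + 1)))
print("total cycles:", total_cycles)
```

With these assumptions the last instruction reaches WB in cycle 15, matching the diagram.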





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





#### Recall: MIPS with Forwarding



- MEM/EX path
- MEM/ID path
- MEM/MEM path







|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





#### Exe 3.3: Forwarding Paths



|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |








|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       | IF    | ID(s) | ID    | EX    | M     | WB    |       |       |       |       |       |       |





#### Exe 3.4: Forwarding Paths + EX/ID



|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |








|   | Instruction            | C1    | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | C16   |
|---|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | add \$t1, \$t0, \$t1   | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |       |
| 2 | add \$t2, \$t1, \$t2   |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |       |
| 3 | subi \$t0, \$t2, 1     |       |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |       |
| 4 | sw \$t0, 0x00BB(\$t2)  |       |       |       | IF    | ID    | EX    | M     | WB    |       |       |       |       |       |       |       |       |
| 5 | beq \$t0, \$t2, 0x0089 |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |





# Exe 3.4: Forwarding Paths + EX/ID



|   | Instruction            | C1        | C2 | C3 | C4 | C5         | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 |
|---|------------------------|-----------|----|----|----|------------|----|----|----|----|-----|-----|-----|-----|-----|-----|-----|
| 1 | add \$t1, \$t0, \$t1   | IF        | ID | EX | M  | WB         |    |    |    |    |     |     |     |     |     |     |     |
| 2 | add \$t2, \$t1, \$t2   |           | IF | ID | EX | M          | WB |    |    |    |     |     |     |     |     |     |     |
| 3 | subi \$t0, \$t2, 1     |           |    | IF | ID | EX         | M  | WB |    |    |     |     |     |     |     |     |     |
| 4 | sw \$t0, 0x00BB(\$t2)  |           |    |    | IF | ID         | EX | M  | WB |    |     |     |     |     |     |     |     |
| 5 | beq \$t0, \$t2, 0x0089 |           |    |    |    | IF         | ID | EX | M  | WB |     |     |     |     |     |     |     |





#### Recall: Three Classes of Hazards

Structural Hazards: Attempt to use the same resource from different instructions simultaneously. *Example:* Single memory for instructions and data

Data Hazards: Attempt to use a result before it is ready. *Example:* Instruction depending on a result of a previous instruction still in the pipeline

Control Hazards: Attempt to make a decision on the next instruction to execute before the condition is evaluated. *Example:* Conditional branch execution





### Recall: Static Branch Prediction Techniques

**Branch Always Not Taken (Predicted-Not-Taken)** 

**Branch Always Taken (Predicted-Taken)** 

**Backward Taken Forward Not Taken (BTFNT)** 



**Delayed Branch** 
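As a rough illustration, the three prediction policies listed above can be written as functions of the branch address and its target. This is a minimal sketch, not part of the original exercise: the function names and the `(pc, target)` signature are hypothetical. Delayed Branch is left out because it is not a prediction; the instruction placed in the branch delay slot is executed regardless of the branch outcome.

```python
def predict_not_taken(pc, target):
    """Branch Always Not Taken: always predict fall-through."""
    return False


def predict_taken(pc, target):
    """Branch Always Taken: always predict a jump to the target."""
    return True


def predict_btfnt(pc, target):
    """Backward Taken Forward Not Taken.

    A backward branch (target address lower than the branch address,
    as at the bottom of a loop) is predicted taken; a forward branch
    is predicted not taken.
    """
    return target < pc


# Hypothetical addresses, chosen only to show one backward and one forward branch.
print(predict_btfnt(pc=0x40, target=0x20))  # backward branch
print(predict_btfnt(pc=0x40, target=0x80))  # forward branch
```

With BTFNT, loop-closing branches are predicted taken, which matches their usual behavior.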





# Exe 3.4 : Pipeline Schema+Static BP



#### CC 15

|   | Instruction            | C1         | C2 | C3         | C4    | C5         | C6    | C7        | C8 | C9    | C10   | C11 | C12 | C13 | C14 | C15 | C16 |
|---|------------------------|------------|----|------------|-------|------------|-------|-----------|----|-------|-------|-----|-----|-----|-----|-----|-----|
| 1 | add \$t1, \$t0, \$t1   | IF         | ID | EX         | M     | WB         |       |           |    |       |       |     |     |     |     |     |     |
| 2 | add \$t2, \$t1, \$t2   |            | IF | ID(s)      | ID(s) | ID         | EX    | M         | WB |       |       |     |     |     |     |     |     |
| 3 | subi \$t0, \$t2, 1     |            |    | IF(s)      | IF(s) | IF         | ID(s) | ID(s)     | ID | EX    | M     | WB  |     |     |     |     |     |
| 4 | sw \$t0, 0x00BB(\$t2)  |            |    |            |       |            | IF(s) | IF(s)     | IF | ID(s) | ID(s) | ID  | EX  | M   | WB  |     |     |
| 5 | beq \$t0, \$t2, 0x0089 |            |    |            |       |            |       |           |    | IF(s) | IF(s) | IF  | ID  | EX  | M   | WB  |     |
| 6 | NEW INSTRUCTION        |            |    |            |       |            |       |           |    |       |       |     |     |     |     |     |     |






#### Conditional Branches: beq \$x,\$y,offset

| IF                        | ID                               | EX                                | M        | WB |
|---------------------------|----------------------------------|-----------------------------------|----------|----|
| Instr. Fetch & PC Increm. | Read of Source Regs. \$x and \$y | ALU Op. (\$x-\$y) & (PC+4+offset) | Write PC |    |
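The stage table above places the PC write in the fourth stage of the branch. A minimal sketch of the resulting arithmetic follows. It assumes that the fetch of the next instruction simply waits until the PC has been written and that the branch itself does not stall after its IF; the function name and the stage list are hypothetical. With the `beq` fetched in C11, as in the schema above, this gives cycle 15, consistent with the "CC 15" noted on the slide.

```python
STAGES = ["IF", "ID", "EX", "M", "WB"]


def next_fetch_cycle(branch_if_cycle, pc_write_stage="M"):
    """Cycle in which the instruction following a branch can be fetched.

    Assumption: the pipeline waits for the branch to write the PC
    (no prediction), and the branch advances one stage per cycle.
    """
    pc_write_cycle = branch_if_cycle + STAGES.index(pc_write_stage)
    return pc_write_cycle + 1


print(next_fetch_cycle(11))  # 15
```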












### Recall: Dynamic Branch Prediction



### Dynamic Branch Predictor

Describe a 1-bit BHT (1-BHT) and a 2-bit BHT (2-BHT) able to execute the following assembly code, and support your answer (R0 is set to 2000, R1 is set to 0).

|        |       |    |       |     |
|--------|-------|----|-------|-----|
| LOOP:  | LD    | F1 | 0     | R0  |
|        | ADDD  | F2 | F1    | F1  |
|        | ADDI  | R1 | R1    | 100 |
| LOOP2: | MULTD | F2 | F2    | F1  |
|        | SUBI  | R1 | R1    | 1   |
|        | BNEZ  | R1 | LOOP2 |     |
|        | SUBI  | R0 | R0    | 2   |
|        | BNEZ  | R0 | LOOP  |     |

Is the obtained result, in terms of mispredictions, in line with the theoretical characteristics of the two predictors? Support your answer.



#### A First Consideration

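|        |       |    |       |     |
|--------|-------|----|-------|-----|
| LOOP:  | LD    | F1 | 0     | R0  |
|        | ADDD  | F2 | F1    | F1  |
|        | ADDI  | R1 | R1    | 100 |
| LOOP2: | MULTD | F2 | F2    | F1  |
|        | SUBI  | R1 | R1    | 1   |
|        | BNEZ  | R1 | LOOP2 |     |
|        | SUBI  | R0 | R0    | 2   |
|        | BNEZ  | R0 | LOOP  |     |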





### How many iterations?

LOOP: LD F1 0 R0

ADDD F2 F1 F1

ADDI R1 R1 100

LOOP2: MULTD F2 F2 F1

SUBI R1 R1 1

BNEZ R1 LOOP2

SUBI R0 R0 2

BNEZ R0 LOOP

R0 is set to 2000; R1 is set to 0






# How many iterations?

R0 is set to 2000; R1 is set to 0

|        |       |    |       |     |                           |
|--------|-------|----|-------|-----|---------------------------|
| LOOP:  | LD    | F1 | 0     | R0  |                           |
|        | ADDD  | F2 | F1    | F1  |                           |
|        | ADDI  | R1 | R1    | 100 |                           |
| LOOP2: | MULTD | F2 | F2    | F1  |                           |
|        | SUBI  | R1 | R1    | 1   |                           |
|        | BNEZ  | R1 | LOOP2 |     | LOOP2 @T0: 100 iterations |
|        | SUBI  | R0 | R0    | 2   |                           |
|        | BNEZ  | R0 | LOOP  |     | LOOP: 1000 iterations     |
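The iteration counts above can be checked by mirroring the integer part of the loop nest in a few lines of Python. This is only a sketch: the floating-point instructions are ignored because they do not affect control flow, and the function and variable names are hypothetical.

```python
def run_loops(r0=2000, r1=0):
    """Mirror the control flow of the exercise code.

    Returns the number of LOOP iterations and, for each of them,
    the number of LOOP2 iterations.
    """
    outer_iterations = 0
    inner_per_outer = []
    while True:                    # LOOP:
        outer_iterations += 1
        r1 += 100                  # ADDI R1 R1 100
        inner = 0
        while True:                # LOOP2:
            inner += 1
            r1 -= 1                # SUBI R1 R1 1
            if r1 == 0:            # BNEZ R1 LOOP2 falls through
                break
        inner_per_outer.append(inner)
        r0 -= 2                    # SUBI R0 R0 2
        if r0 == 0:                # BNEZ R0 LOOP falls through
            break
    return outer_iterations, inner_per_outer


outer, inner = run_loops()
print(outer, set(inner))  # 1000 {100}
```

R1 is back to 0 every time LOOP2 exits, so each execution of LOOP2 runs 100 iterations, not only the first one.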










#### 1bit - BHT

R0 is set to 2000

The BHT is indexed by k bits of the branch address, so the entries of the two branches can either collide or not collide.





R0 is set to 2000; R1 is set to 0

Let us consider that the branch addresses do not collide

1-BHT initial state: LOOP: T, LOOP2: T

[Figure: evolution of the 1-BHT entries (T/NT) of the LOOP and LOOP2 branches]









# Let us consider that the branch addresses do not collide

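LOOP2 100 iterations

LOOP 1000 iterations

Mispredictions of the LOOP2 branch: 1 + (1000-1) \* 2 = 1999

The first execution of LOOP2 mispredicts only its exit, because the entry starts at T. Each of the remaining 1000-1 executions mispredicts both its first iteration (the entry was left at NT by the previous exit) and its exit.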
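The count above can be cross-checked with a small simulation of a 1-bit predictor in which each branch has its own entry. This is a sketch under stated assumptions: both entries start at T, as in the initial state shown earlier, and the helper names (`branch_trace`, `one_bit_mispredictions`) are hypothetical.

```python
def branch_trace(outer=1000, inner=100):
    """Yield (branch name, taken) in program order for the nested loops."""
    for i in range(outer):
        for j in range(inner):
            # BNEZ R1 LOOP2: taken except on the last inner iteration
            yield "LOOP2", j < inner - 1
        # BNEZ R0 LOOP: taken except on the last outer iteration
        yield "LOOP", i < outer - 1


def one_bit_mispredictions(trace, initial_taken=True):
    """1-bit BHT with one private entry per branch (no collisions)."""
    state = {}
    misses = {}
    for name, taken in trace:
        predicted = state.get(name, initial_taken)
        if predicted != taken:
            misses[name] = misses.get(name, 0) + 1
        state[name] = taken  # a 1-bit entry remembers only the last outcome
    return misses


misses = one_bit_mispredictions(branch_trace())
print(misses)  # {'LOOP2': 1999, 'LOOP': 1}
```

Under these assumptions the LOOP branch adds a single misprediction, on its final not-taken execution.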
















#### 1bit - BHT - Collision

R0 is set to 2000; R1 is set to 0

#### Let us consider that the branch addresses do collide

LOOP2 100 iterations

1-BHT: the LOOP and LOOP2 branches share the same entry







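R0 is set to 2000; R1 is set to 0

# Let us consider that the branch addresses do collide

LOOP2 100 iterations

Mispredictions: (1+1) \* (1000-1) + 1 = 1999

In each of the first 1000-1 iterations of LOOP, the shared entry causes 2 mispredictions: the exit of LOOP2, which leaves the entry at NT, and the taken LOOP branch that follows it. The last iteration of LOOP mispredicts only the exit of LOOP2, because the LOOP branch is then not taken, as predicted.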
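The formula above can be checked with a short simulation in which the two branches share a single 1-bit entry. This is a sketch, assuming the shared entry starts at T; the function name is hypothetical.

```python
def shared_one_bit_mispredictions(outer=1000, inner=100, initial_taken=True):
    """1-bit BHT where BNEZ R1 LOOP2 and BNEZ R0 LOOP map to the same entry."""
    entry = initial_taken
    misses = 0
    for i in range(outer):
        # Outcomes of BNEZ R1 LOOP2 for one execution of the inner loop
        outcomes = [True] * (inner - 1) + [False]
        # Outcome of BNEZ R0 LOOP: taken except on the last outer iteration
        outcomes.append(i < outer - 1)
        for taken in outcomes:
            if entry != taken:
                misses += 1
            entry = taken  # both branches update the same bit
    return misses


print(shared_one_bit_mispredictions())  # 1999
```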












LOOP: LD F1 0 R0

ADDD F2 F1 F1

ADDI R1 R1 100

LOOP2: MULTD F2 F2 F1

SUBI R1 R1 1

BNEZ R1 LOOP2

SUBI R0 R0 2

BNEZ R0 LOOP

R0 is set to 2000; R1 is set to 0

Let us consider that the branch addresses do not collide

2-BHT initial state: LOOP: 11, LOOP2: 11
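A minimal simulation of the 2-bit case without collisions follows. It is a sketch resting on assumptions the slides do not spell out: each entry is a saturating counter, 11 means strongly taken, and a branch is predicted taken when its counter is 10 or 11. Since the entries do not collide, each branch is simulated on its own outcome sequence. All names are hypothetical.

```python
def two_bit_mispredictions(outcomes, counter=0b11):
    """Count mispredictions of one 2-bit saturating counter (assumed scheme)."""
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 0b10
        if predicted_taken != taken:
            misses += 1
        # Saturating update: move towards 11 on taken, towards 00 on not taken
        counter = min(counter + 1, 0b11) if taken else max(counter - 1, 0b00)
    return misses


OUTER, INNER = 1000, 100
# BNEZ R1 LOOP2: 99 taken then 1 not taken, repeated for every LOOP iteration
loop2_outcomes = ([True] * (INNER - 1) + [False]) * OUTER
# BNEZ R0 LOOP: taken 999 times, then not taken once
loop_outcomes = [True] * (OUTER - 1) + [False]

loop2_misses = two_bit_mispredictions(loop2_outcomes)
loop_misses = two_bit_mispredictions(loop_outcomes)
print(loop2_misses, loop_misses)  # 1000 1
```

Under these assumptions only the exit of each loop is mispredicted, because a single not-taken outcome moves the counter from 11 to 10 and the prediction stays taken.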












LOOP: LD F1 0 R0

ADDD F2 F1 F1

ADDI R1 R1 100

LOOP2: MULTD F2 F2 F1

SUBI R1 R1 1

BNEZ R1 LOOP2

SUBI R0 R0 2

BNEZ R0 LOOP

R0 is set to 2000; R1 is set to 0

Let us consider that the branch addresses do collide
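The colliding case for a 2-bit predictor can be sketched by letting both branches read and update one shared counter, in program order. The assumptions are not stated in the slides: a saturating counter starting at 11 (strongly taken) that predicts taken for 10 and 11. The function name is hypothetical.

```python
def shared_two_bit_mispredictions(outer=1000, inner=100, counter=0b11):
    """2-bit saturating counter shared by BNEZ R1 LOOP2 and BNEZ R0 LOOP."""
    misses = 0
    for i in range(outer):
        # BNEZ R1 LOOP2: taken except on the last inner iteration
        outcomes = [j < inner - 1 for j in range(inner)]
        # BNEZ R0 LOOP: taken except on the last outer iteration
        outcomes.append(i < outer - 1)
        for taken in outcomes:
            if (counter >= 0b10) != taken:
                misses += 1
            counter = min(counter + 1, 0b11) if taken else max(counter - 1, 0b00)
    return misses


print(shared_two_bit_mispredictions())  # 1001
```

Under these assumptions every exit of LOOP2 is mispredicted (1000 times), and the final not-taken LOOP branch adds one more misprediction.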














# Thank you for your attention

Questions?

Alessandro Verosimile <alessandro.verosimile@polimi.it>

#### **Acknowledgements**

Davide Conficconi, E. Del Sozzo, Marco D. Santambrogio, D. Sciuto

Part of this material comes from:

- "Computer Organization and Design" and "Computer Architecture: A Quantitative Approach", Patterson and Hennessy books
- News and papers cited throughout the lecture

and is the property of its respective owners



